AITopics | input gradient highlight discriminative feature

Do Input Gradients Highlight Discriminative Features?

Neural Information Processing Systems | Dec-23-2025, 18:48:33 GMT

Post-hoc gradient-based interpretability methods [Simonyan et al., 2013, Smilkov et al., 2017] that provide instance-specific explanations of model predictions are often based on assumption (A): the magnitude of input gradients (gradients of logits with respect to the input) noisily highlights discriminative task-relevant features. In this work, we test the validity of assumption (A) using a three-pronged approach:

1. We develop an evaluation framework, DiffROAR, to test assumption (A) on four image classification benchmarks. Our results suggest that (i) input gradients of standard models (i.e., trained on original data) may grossly violate (A), whereas (ii) input gradients of adversarially robust models satisfy (A).
2. We then introduce BlockMNIST, an MNIST-based semi-real dataset that by design encodes a priori knowledge of discriminative features. Our analysis on BlockMNIST leverages this information to validate as well as characterize differences between input gradient attributions of standard and robust models.
3. Finally, we theoretically prove that our empirical findings hold on a simplified version of the BlockMNIST dataset. Specifically, we prove that input gradients of standard one-hidden-layer MLPs trained on this dataset do not highlight instance-specific signal coordinates, thus grossly violating assumption (A).

Our findings motivate the need to formalize and test common assumptions in interpretability in a falsifiable manner [Leavitt and Morcos, 2020].
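To make the quantity in assumption (A) concrete, here is a minimal sketch of an input-gradient attribution for a one-hidden-layer ReLU MLP. The weights are random placeholders, and the names `mlp_logits` and `input_gradient` are hypothetical helpers introduced for illustration; none of this is the authors' code.

```python
import numpy as np

def mlp_logits(x, W1, b1, W2, b2):
    """Logits of a one-hidden-layer ReLU MLP for a single input vector x."""
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2

def input_gradient(x, W1, b1, W2, b2, cls):
    """Gradient of logit `cls` with respect to x, via the chain rule."""
    pre = W1 @ x + b1
    active = (pre > 0).astype(x.dtype)  # derivative of ReLU at each hidden unit
    return (W2[cls] * active) @ W1

# Illustrative random model and input (assumed values, not trained weights).
rng = np.random.default_rng(0)
d, hdim, c = 8, 16, 3
W1 = rng.normal(size=(hdim, d))
b1 = rng.normal(size=hdim)
W2 = rng.normal(size=(c, hdim))
b2 = rng.normal(size=c)
x = rng.normal(size=d)

grad = input_gradient(x, W1, b1, W2, b2, cls=0)
attribution = np.abs(grad)          # magnitude of the input gradient
ranking = np.argsort(-attribution)  # coordinates ordered from most to least highlighted
```

Under assumption (A), the coordinates at the front of `ranking` would tend to be the task-relevant ones. The abstract reports that this can fail for standard models.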

assumption, input gradient highlight discriminative feature, name change

Genre: Research Report > New Finding (0.96)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)


Do Input Gradients Highlight Discriminative Features?

Neural Information Processing Systems | Oct-9-2024, 13:10:00 GMT

assumption, dataset, input gradient highlight discriminative feature

Genre: Research Report > New Finding (0.79)

Technology: Information Technology > Artificial Intelligence (0.59)


Do Input Gradients Highlight Discriminative Features?

Shah, Harshay, Jain, Prateek, Netrapalli, Praneeth

arXiv.org Artificial Intelligence | Feb-25-2021

Interpretability methods that seek to explain instance-specific model predictions [Simonyan et al. 2014, Smilkov et al. 2017] are often based on the premise that the magnitude of the input gradient (the gradient of the loss with respect to the input) highlights discriminative features that are relevant for prediction over non-discriminative features that are irrelevant for prediction. In this work, we introduce an evaluation framework to study this hypothesis for benchmark image classification tasks, and make two surprising observations on the CIFAR-10 and ImageNet-10 datasets: (a) contrary to conventional wisdom, input gradients of standard models (i.e., trained on the original data) actually highlight irrelevant features over relevant features; (b) however, input gradients of adversarially robust models (i.e., trained on adversarially perturbed data) starkly highlight relevant features over irrelevant features. To better understand input gradients, we introduce a synthetic testbed and theoretically justify our counter-intuitive empirical findings. Our observations motivate the need to formalize and verify common assumptions in interpretability, while our evaluation framework and synthetic dataset serve as a testbed to rigorously analyze instance-specific interpretability methods.
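The abstract does not spell out how the evaluation framework works. One plausible ingredient, assumed here rather than taken from the paper, is a masking step that keeps only the coordinates an attribution ranks highest (or lowest) and zeroes out the rest, so that the two masked versions can be compared downstream. The helper `unmask_by_rank` and the toy values below are hypothetical.

```python
import numpy as np

def unmask_by_rank(x, attribution, k, keep="top"):
    """Keep k coordinates of x chosen by attribution rank; zero out the rest."""
    order = np.argsort(-attribution, kind="stable")  # highest attribution first
    chosen = order[:k] if keep == "top" else order[len(order) - k:]
    masked = np.zeros_like(x)
    masked[chosen] = x[chosen]
    return masked

# Toy input and attribution scores (illustrative values only).
x = np.array([0.5, -1.0, 2.0, 0.1, 3.0])
attribution = np.array([0.9, 0.1, 0.4, 0.05, 0.7])

top2 = unmask_by_rank(x, attribution, k=2, keep="top")        # keeps indices 0 and 4
bottom2 = unmask_by_rank(x, attribution, k=2, keep="bottom")  # keeps indices 1 and 3
```

If an attribution really highlights relevant features, a model that sees only the top-ranked coordinates should retain more predictive power than one that sees only the bottom-ranked coordinates. Observation (a) in the abstract corresponds to the opposite outcome for standard models.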

gradient, input gradient, input gradient highlight discriminative feature

arXiv: 2102.12781

Country:

Europe > Italy > Marche > Ancona Province > Ancona (0.04)
Asia > India (0.04)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
